In the sequel, we denote by \(F_{t}\) the cumulative distribution function for year \(t\). We agree on \(\overline{F}_t = 1 - F_t\) and \(F_t(-1)=0\). Henceforth, \(\overline{F}\) is called the survival function.
qx
(age-specific) risk of death at age \(x\), or mortality quotient at given age \(x\) for given year \(t\).
About the definition of \(q_{t,x}\)
Defining and computing \(q_{t,x}\) does not boil down to knowing the number of people at age \(x\) at the beginning of ear \(t\) and knowing how many of them died during year \(t\). If we want to be rigorous, we need to know all life lines in the Lexis diagram, or equivalently, how many people at Age \(x\) were alive on each day of Year \(t\).
Mortality quotients define a probability distribution
For a given year \(t\), the sequence of mortality quotients define a survival function \(\overline{F}_t\) using the following recursion:
\[q_{t,x} = \frac{\overline{F}_t(x) - \overline{F}_t(x+1)}{\overline{F}_t(x)}\] with boundary condition \(\overline{F}_t(-1) =1\).
This artificial probability distribution is used to define and compute life expectancies.
\(q_{t,x}\) is the hazard rate of \(\overline{F}_t\) at age \(x\).
ex:
Residual Life Expectancy at age \(x\) and year \(t\)
This is the expectation of \(X -x\) for a random variable \(X\) distributed according to \(\overline{F}_t\) conditionnally on the event \(\{ X \geq x \}\). That is \(e_{t,x}\) is the expectation of the probability distribution defined by \(\overline{F}_t(\cdot + x-1)/\overline{F}_t(x-1)\).
Rearrangement
Question
From dataframe life_table, compute another dataframe called life_table_pivot with primary key Country, Gender and Year, with a column for each Age from 0 up to 110. For each age column, the entry should be the central death rate at the age defined by column, for Country, Gender and Year identifying the row.
You may use functions pivot_wider, pivot_longer from tidyr:: package.
gt_1<-small_lex_table|>gt()|>tab_header( title ="Computed Life Expectancies at birth", subtitle ="a sample")
Code
gt_1
Life expectancy and window functions
Question
Write a function that takes as input a vector of mortality quotients, as well as an age, and returns the residual life expectancy corresponding to the vector and the given age.
Write a function that takes as input a dataframe with the same schema as life_table and returns a data frame with columns Country, Gender, Year, Age defining a primary key and a column res_lex containing residual life expectancy corresponding to the pimary key.
In order to compute residual life expectancies, you may consider using window functions over apropriately defined windows. The next window function suffices to compute life expectancy at birth. It computes the logarithm of survival probabilities for each Country, Year, Gender (partition) at each Age. Note that the expression mentions an aggregation function sum and that the correction of the result is ensured by a correct design of the frame argument.
Code
df<-life_table|>select(Country, Gender, Year, Age, qx)|>group_by(Country, Year, Gender)|>arrange(Age)|>mutate(sx =cumprod(1-qx))# window function
gt_2<-df_leb|>sample_n(size =20)|>arrange(Year)|>gt()|>tab_header( title ="Computed Life Expectancies at birth", subtitle ="a sample")
Code
gt_2
Computation of Life Expectancy at birth boils down to compute survival probabilities using one window function, grouping with respect to country, gender and year, ordering by age and summing up survival probabilities.
Question
Compute residal life expectancies at all ages using window functions
In order to compute Residual Life Expectancies at all ages, instead of performing aggregation, we compute a second window function.
The window is more sophisticated than the previous one, we still need partitioning by Year, Country and Gender, ordering by Age, but we also need to sum over conditional survival probabilities, which are just ratios of survival probabilities, but over a frame defined by the current Age and all ages above.
df_ex|>sample_n(size =10)|>select(-sx, -qx, Country, Year, Gender, Age, ex)|>arrange(Year)|>gt()|>tab_header( title ="Computed Life Expectancies at all ages using slide_dbl()", subtitle ="a sample")
This is slow.
Computing residual life expectancies using window functions and accumulate
The official calculation of residual life expectancies assumes that except at age \(0\) and great age, people die uniformly at random between age \(x\) and \(x+1\): \[
e_{t,x} = (1- q_{t,x}) \times (1 + e_{t,x+1}) + \frac{1}{2} \times q_{t,x}
\]
This recursion suggests a more efficient to compute residual life expectancies at all ages.
Indeed, purrr::accumulate() allows to compute all values for \(e_{t,x}\) using exactly one pass over the table.
df_rle<-life_table|>select(Country, Gender, Year, Age, qx)|>group_by(Country, Year, Gender)|>arrange(desc(Age))|>mutate(`Residual Life Expectancy`=rlex(qx))|>ungroup()
Code
# tbl_ResLifeExpectancy |>#fn_1<-function(df, up_a=10, g='Female', y=2016){df|>filter(Age<up_a, Gender==g, Year==y)|>select(Country, Age, `Residual Life Expectancy`)|>arrange(Age)|>pivot_wider( names_from=Age, values_from=`Residual Life Expectancy`)|>gt()|>tab_header( title ="Computed Life Expectancies at different ages", subtitle =glue::glue("{g} under {up_a} for Year {y}"))|>fmt_engineering(columns=-Country)|>gt::tab_spanner(label ="Age", columns =seq(2,1+up_a))}
Code
gt_3<-fn_1(df_rle)
Code
gt_3
Question
Compute and display residual life expectancies for ages \(0\) to \(9\) for year \(1972\)
Code
gt_3_b<-fn_1(df_rle, y=1972)
Code
gt_3_b
Note that for year \(1972\), except in the Netherlands and in Sweeden, for girls, residual life expectancies at age \(0\) are slightly lower than residual life expectancies at age \(1\).
Is it a surprise?
Question
Plot residual life expectancy as a function of Year at ages \(60\) and \(65\), facet by Gender and Country.
df_tmp|>ggplot()+aes(x=Year, y=ex, group=Age, color=as_factor(Age))+labs(color="Age")+ylab("Residual life expectancy")+geom_line()+facet_grid(rows=vars(Country), cols=vars(Gender))+ggtitle("Evolution of residual life expectancy at 60 and 65")
Comment.
Except in Spain, Residual Life Expectancies started to take off late, after 1970.
{df_pension|>ggplot(aes(frame=Year, y=ex, x=Age, linetype=Country, col=Country))+labs(linetype="Country", col="Country")+ylab("Residual life expectancy")+geom_line(size=.2)+facet_grid(cols=vars(Gender))+ggtitle("Residual Life Expectancy takes off")}|>ggplotly()
Code
{df_pension|>ggplot(aes(frame=Year, y=Age+ex, x=Age, linetype=Country, col=Country))+labs(linetype="Country", col="Country")+ylab("Conditional expected age at death")+ylim(c(60,100))+geom_line(size=.2)+facet_grid(cols=vars(Gender))}|>ggplotly()